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Abstract — The fidelity of radio astronomical images is generally 
assessed by practical experience, i.e. using rules of thumb, 
although some aspects and cases have been treated rigorously. 
In this paper we present a mathematical framework capable 
of describing the fundamental limits of radio astronomical 
CD imaging problems. Although the data model assumes a single 
1— < snapshot observation, i.e. variations in time and frequency are 
CD not considered, this framework is sufficiently general to allow 
extension to synthesis observations. Using tools from statistical 
5^ ' signal processing and linear algebra, we discuss the tractability 
d [ of the imaging and deconvolution problem, the redistribution of 
, noise in the map by the imaging and deconvolution process, the 
covariance of the image values due to propagation of calibration 
_™ ' errors and thermal noise and the upper limit on the number 
' of sources tractable by self calibration. The combination of 
covariance of the image values and the number of tractable 

i 1 sources determines the effective noise floor achievable in the 

' imaging process. The effective noise provides a better figure of 
' merit than dynamic range since it includes the spatial variations 
of the noise. Our results provide handles for improving the 
imaging performance by design of the array. 

Index Terms — radio astronomy, imaging, deconvolution, noise, 
dynamic range 
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I. Introduction 

The radio astronomical community is currently building or 
developing a number of new instruments such as the low 
frequency array (LOFAR) [lj, the square kilometer array 
(SKA) and the Mileura wide field array (MWA) 0. 
Imaging and self calibration of these radio telescopes will 
be computationally demanding tasks due to the large number 
of array elements. Much research is therefore focused on 
finding clever short-cuts to reduce the amount of processing 
required, such as w-projection [4j or facet imaging [5| and 
different variants of CLEAN [6|. The validity and quality of 
these methods is generally assessed by practical experience. 
Attempts to do a rigorous analysis are done for some aspects 
and cases ll71l- lfT0ll . but most of the time rules of thumb are 
used. This paper presents the first comprehensive mathematical 
framework capable of describing the fundamental limits of 
radio astronomical imaging problems. The data model used 
in this paper applies to snapshot observations, i.e. variations 
in time and frequency are not considered. However, using 
a multi-measurement data model such as those in ifTTI — lTT3l 
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it is straightforward to extend the data model to synthesis 
observation and still apply the framework described herein. 

The resolution of the final image (or map) is normally 
determined by the size and configuration of the array and the 
spatial taper function. Under the assumption that the sky is 
mainly empty, i.e. that the image contains only a few sources, 
maps with higher resolution than predicted by the array 
configuration (superresolution) can be made using CLEAN. 
The Maximum Entropy Method (MEM) [14| imposes a similar 
consttaint by aiming for a solution that is as featureless as 
possible. In the array processing literature, superresolution 
is achieved by high resolution direction of arrival (DOA) 
estimation techniques such as MUSIC [15| and weighted 
subspace fitting 1161 . flTl . In all these approaches the goal 
is to disentangle the spatial response of the array and the 
source structure, a process called deconvolution. In section Hill 
we formulate imaging as an estimation problem, an approach 
called model based imaging, and obtain an analytic expression 
for its least squares solution that allows us to formulate 
the deconvolution problem as a matrix inversion problem. 
This provides a powerful tool to assess the tractability of 
the deconvolution problem and to demonstrate the impact 
on the array configuration on the deconvolution problem and 
the redistribution of noise in the imaging and deconvolution 
process. 

The dynamic range of an image is generally defined as 
the power ratio between the strongest and the weakest mean- 
ingful features in the map. In practice, the limitations of an 
instrument are more conveniently described by the achievable 
noise floor in an imaging observation since the dynamic range 
strongly depends on the strength of the strongest source within 
the field-of-view and because the noise varies over the map. 
This noise floor is a combination of calibration errors, thermal 
noise and confusion noise. In this paper the term "effective 
noise" refers to the net result of these constituents in the 
image plane. In section [IV] analytical expressions are derived 
that describe the components of the effective noise in terms 
of the covariance of the image values, a concept which we 
will refer to as image covariance. The consequences of these 
expressions are illustrated with a few examples in section [V] 
These examples suggest that the contribution of propagated 
calibration errors to the image covariance is considerably 
smaller than the contribution of thermal noise even if the 
calibration is done on data with similar SNR. They also 
indicate that self calibration causes higher covariance between 
source power estimates than pure imaging does. 

Notation: Overbar (•) denotes complex conjugation. The 
transpose operator is denoted by T , the complex conjugate 
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(Hermitian) transpose by and the Moore-Penrose pseudo- 
inverse by t. The expectation operator is denoted by 8 {•}, 
is the element-wise matrix multiplication (Hadamard product), 
(•) ™ is used to denote the element- wise matrix exponent with 
exponent n, ® denotes the Kronecker product and o is used to 
denote the Khatri-Rao or column-wise Kronecker product of 
two matrices, diag(-) converts a vector to a diagonal matrix 
with the elements of the vector placed on the main diagonal, 
vec(-) converts a matrix to a vector by stacking the columns 
of the matrix and vecdiag(-) converts the main diagonal of 
its argument to a column vector, circulant(-) creates a square 
circulant matrix by circularly shifting the entries of its vector 
argument to form its columns. © will be used to denote 
circular convolution of two vectors, i.e., for vectors of length 
n, (x ® y)j = J2i=o ZiVj-imodn- 

For matrices and vectors of compatible dimensions, we will 
frequently use the following properties: 

vec(ABC) = (C T ® A)vec(B) (1) 

vec(Adiag(b)C) = (C T o A)b (2) 

(AoB) B (CoD) = A ff C0B H D (3) 

(A«)B)(CoD) = ACoBD (4) 

II. Data model 

Consider a phased array consisting of p sensors (an- 
tennas). Denote the baseband output signal of the ith ar- 
ray element as Xi(t) and define the array signal vector 
x(<) = [xi(t), X2(t), ■ ■ ■ ,x p (t)] T . We assume the presence 
of q source signals Sk(t) impinging on the array. These are 
assumed to be mutually independent i.i.d. Gaussian signals, 
and are stacked in a q x 1 vector s(t). Likewise the sensor 
noise signals n.i(t) are assumed to be mutually independent 
Gaussian signals and are stacked in a p x 1 vector n(t). We 
assume that the narrowband condition holds fl8l . We can then 
describe, for the fcth source signal, the phase delay differences 
over the p receiving elements due to the propagation geometry 
by a p-dimensional spatial signature vector a^. The q spatial 
signature vectors are assumed to be known (known source 
locations and array geometry). 

The sensors are assumed to have the same direction de- 
pendent gain behavior which is described by gain factors gofc 
towards the q source signals received by the array. These can 
be collected in a matrix Go = diag([goij go2, • ■ ■ >90q])- The 
direction independent gains and phases can be described as 
7 = [7i , 72 , • • • , 7 P ] T and 4> = [e^ 1 , e^ 2 , • • • , e^] T respec- 
tively, with corresponding diagonal matrix forms T = diag(7) 
and <fr = diag(</>). With these definitions, the array signal 
vector can be described as 




where A = [ai, • • • , a*,] (size p x q) and G = r<&. 



The signal is sampled with period T and N sam- 
ple vectors are stacked into a data matrix X = 
[x(T),x(2T), • • • ,x(NT)}. The covariance matrix of x(t) is 
R = £{x(i)x ff (t)} and is estimated by R = A^XX^. The 



number of samples N in a snapshot observation is equal to 
the product of bandwidth and integration time and typically 
ranges from 10 3 (1 s, 1 kHz) to 10 6 (10 s, 100 kHz) in 
radio astronomical applications. Likewise, the source signal 
covariance E s = diag(er s ) where <j s = \a\ x , <rf 2 , • • ■ ,a 2 q ] T 
and the noise covariance matrix is S„ = diag(er„) where 
0n = [&ni> a n2' ' ' ' > <J n P \ T ■ Then the model for the covari- 
ance matrix for a snapshot observation R based on (0 is 

R=GAG S s G^A H G fl + S, i . (6) 

If the directional response of the antennas is known, Go 
can be absorbed in A. If Go and S s are both unknown, we 
can introduce 

S = GoS s Go ff 

= diag([|3oi| 2 CT s i, • • • , \g 0q \ 2 <J sq ]) = diag(cr) (7) 

with real valued elements cr — \o\,a\,--- ,a 2 ] T . We may 
then restate © as 

R = GASA H G ff + S n . (8) 

The ith element of the sensor array is located at y.- l = 
[xi, Hi, Zi] T . These positions can be stacked in a matrix 
1Z = [ri, V2, ■ ■ ■ , r p ] T (size p x 3). The position of the kth 
source can be denoted by the unit vector 1^ = [lk,mk,n,k] T ■ 
The source positions can be stacked in a matrix C = 
[li , I2 , • • • ,lg] T (size q x 3). The spatial signature matrix A 
can thus be described by 

A = cxp(-j^£ T ) (9) 

where the exponential function is applied element-wise to its 
argument. In the remainder of this paper we will specialize to 
a planar array having Zi — for convenience of presentation 
but without loss of generality. 

III. Imaging and deconvolution 

A. Beam forming versus model based imaging 

The imaging process transforms the covariances of the 
received signals (called visibilities in radio astronomy) to an 
image of the source structure within the field-of-view of the 
receivers. In array processing terms, it can be described as 
follows ifTTIl . To determine the power of a signal received from 
a particular direction (I, m, n), a weight vector 

w = (sJ) H = cxp f -j— K[l,m,n] T J 

= icxp ^-^TZ[l,m,n} T ^j (10) 

is assigned to the array signal vector x(t). The operation 
y(t) = w H x(t) is generally called beamforming and can be 
regarded as a spatially matched filter. Equation dTOb represents 
the most basic beamformer that assumes the presence of only 
a single source and only corrects the signal delays due to 
the array geometry. These weights can be adapted to correct 
the complex gain differences between the receiving elements 
G derived from calibration measurements |[T9"1 . nulling of 
interfering sources [12] and spatial tapering of the array [20|. 
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Fig. 1. (a) Image obtained by normal imaging without deconvolution as 
in )1U . showing the sources and their side lobe patterns, (b) Image obtained 
by model based imaging as in J22K which estimates the power at every pixel 
simultaneously, resulting in a deconvolved image showing only the sources 
without the array response. 



The image value at (I, m, n) is equal to the expected output 
power of the beamformer when pointed into that direction, and 
can be computed directly from the array covariance matrix R 
as 

?(i,m,n)=w H Rw. (11) 

For weights defined as in ( [Tol l, this is known as direct Fourier 
transform imaging. To create an image, w is scanned over all 
relevant (I, m, n). The required weights can be stacked into a 
single matrix W. Since w ff Rw = (w® w) ff vec(R), we can 
stack all image values in a single vector i and write 



i = (Wo W) ff vec(R). 



(12) 



If we only want to image at the source locations, we have 
W = i A. A typical model assumption is that there is a source 
present at every pixel location, in which case 



1 



isF = -j(Ao A)^vcc(R). 



(13) 



This is the classical dirty image. 

Let us assume momentarily that G = I and S„ = 0. 
Inserting the data model (0, or vec(R) = (A o A)er, into 



(fT3] l gives 



1 



£{i BF } = — (A o A)" (A o A)er 
i(A fl A0A fl A),T 

pZ 



(14) 



This shows that the dirty image is not equal to the true source 
structure. To understand the physical meaning of this term, 
consider the product a^a^, where the indices i and j refer 
to the respective columns of A. Using (0 this can be written 
explicitly as 

( -2^A H 

ex P -j-r^ij ex P 



a, a, 



2tt 



E 



exp 



A 



(15) 



The physical interpretation of the inner product between the 
two spatial signature vectors is that it measures the sensitivity 
of the array to signals coming from direction lj while the 
array is steered towards lj. The product A H &j thus describes 
the array sensitivity for all directions of interest stacked in 
C when pointed to lj. It therefore provides the array voltage 
response or array voltage beam pattern centered around Lj, 

by (lj) = A H &j. (16) 

With A defined as in (O, this shows that the voltage beam 
pattern is just the Fourier transform of the spatial weighting 
function resulting from the array configuration and the weight- 
ing of the array elements. The corresponding power beam 
pattern can be calculated as 



bp (l,-) = b v (Ij)Qbv (1,) 



A a, 



A H a 



(17) 



The factor A A0A fl A in (|T4]> can thus be interpreted as a 
convolution by the Fourier transform of the spatial distribution 
of baseline vectors, which is known as the array beam pattern 
or dirty beam H21L 

This effect is illustrated in Fig. [Tf a )- This image is the 
result of a simulated observation with an 8 x 8 half wavelength 
spaced (i.e., spatially Nyquist sampled) 2D uniform rectangu- 
lar array (URA). The grid of image values on the sky is taken 
such that the first Nyquist zone is appropriately sampled. The 
underlying source model contains four sources at grid points 
(-0.33, -0.6, 0.73), (-0.2, -0.6, 0.77), (0.6, -0.2, 0.77) and 
(0.87,0.2,0.46) respectively and er = [1, 0.6, 1.3, 0.1] T . This 
source and array configuration will be used throughout this 
paper unless stated otherwise. The map in Fig. [Tfa) clearly 
shows these four (or three, if one regards the two sources on 
neighboring grid points as a single extended source) being 
convolved with the array beam pattern. 

Following a model based approach, the deconvolution prob- 
lem can be formulated as a maximum likelihood (ML) esti- 
mation problem, that should provide a statistically efficient 
estimate of the parameters. Since all signals are assumed to 
be i.i.d. Gaussian signals, the derivation is standard and the 
ML estimates are obtained by minimizing the negative log- 
likelihood function l22l 



argmin (in |R(«x)| 

(7 \ 



tr R" 1 (cr)R 



(18) 



VERSION 1 1 JUNE 2008 



4 



It does not seem possible to solve this minimization prob- 
lem is closed form, but a weighted least squares covariance 
matching approach is known to lead to estimates that are, for 
a large number of samples, equivalent to ML estimates and 
therefore asymptotically efficient [22]. The problem can thus 
be reformulated as 

er = argmin W c — E n ) W c — 
W r GASA ff G fl W f ""' 



argmin 

(7 



(Wc ® W c ) vec (R - S n ) 



(19) 



(W C GA) o (W C GA) o- 

The solution is given by 

a = ((W C GA) o (W c GA)) f (W c <g) W c ) vec (R - E„ 

(2D) 

Optimal weighting is provided by W c = R 2 . Since radio as- 
tronomical sources are generally very weak with the strongest 
source in the field having an instantaneous SNR in the order 
of 0.01, we can introduce the approximation R s» c 2 I for an 
array of identical elements for convenience of notation. This 
reduces d20T > to 



= (GA o GA) vec R — E n 



(21) 



One may argue that this requires one to know where the 
sources are before doing the imaging. This is generally solved 
by simultaneously estimating the source locations and source 
powers. Although the CLEAN algorithm has not yet been fully 
analyzed, it can be regarded as an iterative procedure to do 
this ifTTI . It is instructive, however, to use (fJTJ for imaging 
by estimating the power on every image point (pixel), i.e., by 
assuming a data model with a source present at every pixel. 
We can simplify (l2"TT i by replacing the Moore-Penrose pseudo- 
inverse by the left pseudo-inverse, to obtain the image vector 



GAo GA GAo GA 



(GA o G A) H vec (R - E n ) 



= (A ff r 2 A©A ff r 2 A 



H 



(GA o GA) vec (R — E 



(22) 



The first factor in this equation represents the deconvolution 
operation. It is therefore convenient to introduce the deconvo- 
lution matrix M = A^T 2 A© A H T 2 A = \A H T 2 A\° 2 . This 
provides a powerful check on the sampling of the image plane. 
If the image plane is oversampled, i.e., if too many image 
points are defined, this matrix will be singular. This property 
demonstrates that high resolution imaging is only possible if 
a limited number of sources is present, i.e., if the number 
of sources is much smaller than the number of resolution 
elements in the field-of-view. The condition number of the 
deconvolution matrix, which provides a measure on the mag- 
nification of measurement noise, is discussed in more detail 
in Sec. IIII-CI This mostly empty field-of-view is commonly 



assumed in astronomical imaging and this assumption is one 
of the reasons why CLEAN and MEM work in practice. Fig. 
|TJ&) shows the image obtained by applying (1221 to the 8x8 
URA. Comparison with the image obtained using (TT~3T > clearly 
shows the effectiveness of the model based imaging approach 
in suppressing the array beam pattern. 



B. Noise redistribution 

If imaging is done without deconvolution by using ( fT3l ). 
the thermal noise adds a constant value to all image values. 
This can be illustrated by assuming that £ |r| = E n , i.e. by 
assuming that the image is completely dominated by thermal 
noise. The expected value of the image then becomes 



\ jj 

ibf = ^j(AoA) vec(E„) 

= \(AqA) H (T n 

l T (T n 
= T 1 

V 



(23) 



where we used the fact that all elements of A have unit 
amplitude. This equation describes an image where all values 
are equal to the average thermal noise per baseline. 

If the imaging process involves deconvolution, the result 
is described by (l22l . For simplicity we will assume that we 
have an array of identical elements, so that we can set G = 
I. Further, to illustrate the effect, we momentarily omit the 
correction by E„ in d22l . In this case, the expected value of 
the image is 



-H 



A" A A H A 



A H A 



02 



(Ao A 



(A Q A) H cr 



H 



A H Af 2 Y\l T a n )l. 



(24) 



In this case, the homogeneity of the thermal noise distribution 

|A ff A ) 

being constant. If this is true, the model based image using 
( 1221 is analogous to the beamformed image based on ( TT31 . A 
special case is the situation in which the columns of A are 
orthonormal. 

Otherwise, the structure is more complicated. This is il- 
lustrated in Fig. [2] which compares the noise distribution in 
the image plane of the 8x8 URA by assuming R = 0.11 
with the corresponding image for a five armed array, each 
arm being an eight element half wavelength spaced Uniform 
Linear Array (ULA). The impact of the redistribution of noise 
can be reduced by estimating the receiver noise powers and 
subtracting these estimates from the array covariance matrix as 
described by ( 1221 . In most astronomical imaging algorithms, 
the autocorrelations are generally ignored completely thus 
effectively introducing a small negative system noise since the 
autocorrelations represent the power sum of the source signals 
and the noise. 
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Fig. 2. (a) Imaging with deconvolution using an 8 X 8 half wavelength 
spaced array for a Nyquist sampled image assuming R = 0.11 (an empty 
field with only thermal noise), (b) Imaging result for a five armed array, each 
arm being an eight element half wavelength spaced ULA. 



C. Deconvolution matrix condition number 

The deconvolution matrix M not only causes a redistribu- 
tion of noise over the map, but also determines whether the 
deconvolution is a well conditioned problem. If the deconvo- 
lution matrix is not invertible, the problem is ill-posed and 
additional constraints are required to obtain a unique solution. 
Different choices for these constraints or even the rigor with 
which they are applied, lead to different imaging results for 
CLEAN and MEM based on the same data. In some cases, this 
may even lead to different interpretation of the final maps 1 14 1. 
These problems arise due to over-interpretation of the data 
by allowing for more image points (parameters) than can be 
justified by the data. In these situations, the condition number 
of the deconvolution matrix will be infinitely large. Even if 
the deconvolution matrix is invertible, its condition number 
may be unacceptably high in view of the SNR of the data: 
the condition number is a measure for the magnification of 
measurement noise [23 1. The condition number thus provides 
a powerful diagnostic tool to assess the feasibility of the 
deconvolution problem at hand. 

It is instructive to analyze a half wavelength spaced ID ULA 



with identical elements, i.e. with G = I, sampling the sky on 
a regular grid. In this case A represents a Fourier transform 
mapping the spatial frequencies on the sky to the spatial 
samples describing the electromagnetic field over the array 
aperture. As demonstrated in the previous section, these spatial 
frequencies will be convolved in the imaging process with the 
Fourier transform of the array aperture taper or voltage beam 
pattern, which can be easily calculated for 1 = 0: 



by (0) = A H a(0) = FT 



0„ 



(25) 



Here FT denotes the Fourier transform, n is the total number 
of image points, p is the number of elements in the array and 
p and l p denote p x 1 vectors containing zeros and ones 
respectively. The corresponding power beam pattern is 

b P (0) = b v (0)0 by (0) 



FT 



0, 



© 



0, 



(26) 



If the columns of A are ordered such that they describe the 
array response vectors for the regularly spaced DOAs starting 
with L = 0, it is easily seen that 



M = A H A A H A = circulant (b P (0)) . 



(27) 



i.e., that the deconvolution matrix for a ID ULA equidistantly 
sampling the image plane is a circulant matrix. Since M is 
a circulant matrix, its eigenvalues A = [Ai, A2, • • • , A„] T are 



given by the Fourier transform of bp(0) 1241 . or 
A = JT(b P (0)) 

= ft (ft 
1 



o„ 



© 



On— p 

lp ' 

On — p 



On — p 



(28) 



since FT{) = Fl (■) for real symmetric functions. 

For Hermitian matrices, the condition number k is given 
by the ratio of the largest and smallest eigenvalue, i.e. k = 
^max/^min l24ll . If the image plane is Nyquist sampled, n = 
2p — 1 and 

A= [1,2,--- ,p-l,p,p-l,--- ,2,1] T . (29) 
In this case the condition number of M is 

\ma X P „„ 

k = = — = p, (30) 

thus M is invertible. The deconvolution problem is therefore 
well-posed and has a unique solution. 

If the image plane is undersampled with n < 2p— 1 samples, 
then 



A = 



n-1 



,P~ 1»P,P- !»•• ■ >P- 



n-1 



(31) 

and k = 9p_(f t _ 1 ) ■ The deconvolution problem in itself is 
thus well-posed and has a unique solution. However, from 
Fourier theory we know that aliasing effects may occur due 
to undersampling. 
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Fig. 3. This plot shows the condition number of the deconvolution matrix 
as function of the image resolution for the 8x8 half wavelength spaced 
array and the five armed array with each arm being an eight element half 
wavelength spaced ULA. 



If the image plane is oversampled with n > 2p — 1 samples, 
then 



A = 



,0,1,2,-- ■ ,p-l,p,p-l,--- ,2,1,0, • 



(32) 



and k = oo. In this case, the deconvolution problem is ill- 
posed and thus not solvable without introducing additional 
boundary conditions to constrain the problem. 

This analysis shows that, for a ID ULA, the condition 
number slowly increases up to Nyquist sampling of the image 
plane and then jumps to infinity. Since a URA is just the 
2D analog of a ID ULA, this behavior is also expected 
for the 8x8 URA introduced earlier. This conjecture is 
confirmed in Fig. [3] which shows the condition number of the 
deconvolution matrix as function of image resolution. This 
figure also shows the corresponding curve for the five armed 
array introduced earlier to demonstrate the impact of less 
regular and sparser sampling of the array aperture. Although 
the array diameter is nearly twice as large, it does not provide 
twice the resolution due to sparser sampling of the aperture 
plane. This plot also demonstrates that a less regular array also 
may have a less strict cut-off: the transition of the condition 
number from small values to infinity is a gradual one. For array 
processing problems, this means that the user should decide 
which value of the condition number (or noise enhancement) 
is still acceptable. 

Regularization is commonly used to avoid uninvertability of 
matrices. In radio astronomical imaging where most sources 
have a low SNR, this would lead to imperfect deconvolution 
causing the weakest sources in the field to be drowned in the 
imperfectly removed array response pattern of the strongest 
sources. However, several forms of implicit regularization have 
been studied to handle special cases like strong interference 

ma. 

IV. Effective noise 

Equation d22l shows that calibration and imaging are 
strongly coupled. Knowledge of the instrumental parameters is 



required to obtain the proper image. People have approached 
this problem in two ways. In the first approach calibration and 
imaging are treated as separate steps, i.e., the instrumental 
parameters are estimated first by a calibration measurement 
and consecutively applied to the actual measurement data. 
The second approach is self calibration which regards the 
estimation of instrumental and image parameters as a single 
parameter estimation problem |[T3l . 1261-1281. 

In either case the achievable dynamic range is limited by the 
combination of estimation errors, thermal noise and confusion 
noise. Together, they determine the effective noise in the image 
which need not be homogeneous over the field of interest. In 
this section a number of analytical expressions are derived 
that describe these contributions in terms of the data model 
presented in Section [TTJ The implications will be discussed in 
Section [V] 



A. Noise in self calibrated images 

In self calibration the instrumental and image parameters are 
estimated simultaneously. Self calibration based on the data 
model presented above can thus be described as simultaneous 
estimation of the omni-directional complex gains, the apparent 
source powers, the source locations and the receiver noise 
powers, i.e., of a parameter vector 



9 



= hi, 



,Jp,<t>2, 



• <j>p, 02) ' 
lT 

m q \ . 



; nli ' " 



np J 



In this parameter vector, <f>\ and af are omitted because they 
are set to constants for the problem to be identifiable. Indeed, 
the restriction a\ = 1 is imposed by the fact that G and X 
share a common factor, while the first constraint is required 
since one can only measure the gain phases with respect to 
some reference, here achieved by setting c/>i = oQ 

Similarly to ( fl9l ), the parameters are obtained by solving 



9 = argmin || W c 
9 



R GAS AG 



2 



(33) 



where G, A, X and X n are all functions of 9 and W c = 
R~2 « —I as argued earlier. 

The minimum variance for an unbiased estimator is given by 
the Cramer-Rao Bound (CRB). The CRB on the error variance 
for any unbiased estimator states that the covariance matrix Cg 
of the parameter vector 9 satisfies [30 1 



c e = £Ue-e) (e-e 



>lj 

- N 



(34) 



where J is the Fisher information matrix (FIM). For Gaussian 
data models J can be expressed as (e.g. IBD ) 



j = f h (r ^r-^f 



(35) 



1 In 1291 it is shown that X/f=i 0i = is the optimal constraint for this 
problem. This constraint has the disadvantage that the location of the phase 
reference is not well defined. Furthermore, the choice for the constraint used 
here simplifies our analysis in combination with the constraints required to 
uniquely identify the source locations and the apparent source powers. 
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where R is the data covariance matrix and F is the Jacobian 
evaluated at the true values of the parameters, i.e., 



<5vec(R) 



50 1 



e' 



(36) 



For the self calibration scenario, the Jacobian can be parti- 
tioned into six parts following the structure of 9: 



(37) 



By substitution of ([8} in ( T36T ), it follows directly that the first 
four components can be expressed as 



GR * ol + lo GR * 



( 38 ) 

j ( (GRo"G) o I I o (GR G) ) I s (39) 



((GA)o(GA)) I s 
Iol 



(40) 
(41) 



where Ro = ASA ff and I s is a selection matrix of appro- 
priate size equal to the identity matrix with its first column 
removed so that the derivatives with respect to <\>\ and a\ are 
omitted. 

If the receiver noise powers of all p elements are the same, 
the expression for F CTn given in (Put should be replaced by 



vec(I) 



(42) 



For the last two components of the FIM, derivatives of 
vec (R) with respect to the source position coordinates are 
required. Let (xi,yi) be the coordinates of the ith array 
element, and introduce 



G x = diag( [2:1,2:2, 



G, 



diag([y 1,2/2, ■ ■ -y P ] )G 



(43) 
(44) 



then these components can be conveniently written as 

.2tt 



F, = -j — (GA o G, A - G x A o GA) SI S (45) 



A 

,27T 



F m = -j— (GA o G y A - G y A o GA) SI S (46) 
A 

These equations show that the entries of the Jacobian related 
to derivatives with respect to the I- and m-coordinates of 
the sources are proportional to the x- and y-coordinates of 
the array elements respectively. The physical interpretation 
of this relation is that a plane wave propagating along the 
coordinate axis of the coordinate to be estimated provides a 
more useful test signal to estimate the source location than a 
signal propagating perpendicular to this axis. 

The preceding equations allow us to compute Cg. The 
variance of the estimated image values, i.e. the noise on the 
image values due to estimation inaccuracy, is given by the 
diagonal of the sub-block C CT(T of this matrix, following the 
partitioning of 9. In general G aa is not a diagonal matrix. The 
other entries in this sub-block describe the way in which the 
noise on the pixels are correlated among themselves — this is 
associated with false structures. 



B. Propagation of calibration errors 

If the instrumental parameters are extracted from separate 
calibration data, the minimum variance on these estimated 
values is given by the CRB on the instrumental parame- 
ters in the calibration experiment, Cg, where now 9 = 
hi, ■ ■ ■ , 7 P , 02, • • • , <t> P , oil, • • • , o 2 n P ] T - With this choice for 
9 the results for F 7 , F^ and F an derived earlier in ( 1381 ). 
d39l and < f4TT > can be used assuming that the calibration 
measurement adheres to the same data model. The propagation 
of the calibration errors to the image is described by 



cov (i) 



/ di 



\d9 7 



di 
~d!f 



(47) 



We thus need to derive di/d~f T , di/d(p T and di/dcr^- 
The derivative of the image values to 7fe is defined as 



di 



d 



■P- = T ^M- 1 (GAoGA) ff vec(R-S„) (48) 

<?7fc o-fk 

where M and G depend on 7. Applying the formula for the 
derivative of an inverted matrix with respect to one of its 
elements [24|, this can be rewritten as 

di _ 

= ( - M- 1 (j^-m) M- 1 (GAoGAf 



M 



-1 



Ik 



(e-^EkkA o GA + GA o ^"E kk A 



H 



xvec (R — E n ) 



(49) 



where Efcfc is the elementary matrix with all its entries set 
to zero except element Ekk which is set to 1. Inserting the 
vectorized version of (O in d49l and removing the Khatri-Rao 
products, we obtain 



di 

d-fk 



= -2 7fc (2-7 fe )M- 1 Re 



A H r 2 Al>cr 



(50) 

We have introduced the notation Afe : = E^A, i.e. A& : has 
only zero valued entries except on the fcth row where the 
elements are equal to the corresponding elements of A. The 
goal of this derivation is to obtain an expression for di/dy r . 
We will thus have to stack the expression for di/djk in a 
single matrix. This is facilitated by introducing a& as the fcth 
row of A and rewriting d50] i as 



di 

djk 



= -2 7fe (2 - 7fc ) M^Re {a^l T A H r 2 ASaf } 



(51) 

where 1 denotes a vector of ones of appropriate size. 

By stacking all vector di/djk in a single matrix, we thus 
obtain 



di 

d^ 



= -2M -1 Re {A T A H r 2 ASA H } (21 - T) T. 



(52) 
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The corresponding result for fa can be derived in a similar 
way, so we only present the main steps. 



di 



(GA o GA) 1 vec (R) 



H 



MT 1 ( ^-GAoGAj ((GAoGA) fl (T 



M 



-je" J0fc r A k: o G A + je J0fc G A o T A fc: 



H 



H 



x GAoGA <r 



(53) 



Removal of the Khatri-Rao products by reducing them to 
Hadamard products gives 

^- = -2 7 , 2 M- 1 Im{(AfA^0A H r 2 A)} CT (54) 



Note that this term has the same form as the first term in 
( T50t . so it can be rewritten in a similar way. This gives 

^- = -2 7 2 M- 1 Im {a£l T A ff T 2 ASaf } (55) 

ofa 



and therefore 

m 



d^ 1 



= -2M- 1 Im|A H A H r 2 ASA ff |r 2 (56) 



Finally, the partial derivative of the image values with 
respect to (cr^k) ' s gi yen by 



di 



d 



, r , - , ,-M- 1 (GAoGA)%ec(R-£ n ) 



= M-^GAoGA) vec(-E fefe ) 



M- L [\GAD vecdiag(E fcfc ). (57) 



Therefore 



gi 



M 



1 (|GA| 02 ) 



H 



If £„ = ofjl this reduces further to 



«-^(|GA)«)'l 



(58) 



(59) 



The partial derivatives as well as the CRB |fl9| contain 
terms involving A H A, often weighted by the gains of the 
receiving elements. Given the physical interpretation of this 
factor discussed in section [TTTJ this suggests that the error 
patterns introduced in the image by calibration errors follow 
the structures in the dirty image. This is confirmed by the 
example in section [V] Since the CRB is inversely proportional 
with N, which is equal to the product of bandwidth and 
integration time, the image covariance due to calibration errors 
decreases proportional to bandwidth and integration time. 



C. Thermal noise 

In this section we derive an expression for the covariance 
of the image values due to the thermal noise in the data. We 
will therefore assume that perfect knowledge of the thermal 
noise power E„ is available to avoid confusion between the 
thermal noise contribution and the contribution of propagated 
estimation errors. The covariance of the image values is by 
definition given by 



cov (i) 



H 



— £ | (vec (ij — vec (i)^ (vec (Cj — vec (i) 
= S | (GA o GA) 1 (vec (R - £„) - vec (R - S n ) 
x (vec (R - £„) - vec (R - " 
x(GAoGA) tff |. (60) 

This shows that under the assumption that perfect knowledge 
on S„ in R is available, S„ drops out. Furthermore, (GA o 
GA)' can be moved outside the expectation operator, since it 
contains no estimated values. Therefore 



cov (i) = IVr 1 (GA o G A) H cov (R) (GA o GA) M" 1 . 

(61) 

For Gaussian data models 



cov (R) = — (R <gs R) , 



(62) 



and we find that 



cov (i) 



1 



= ^M" 1 (GA o GA) H (R« R) (GAo GA) M" 1 . 

This can be rewritten using Kronecker and Khatri-Rao product 
relations as 



1 



cov(i) = —Mr 1 (GA o GA) H (RGA o RGA) M 



N 

= Im-Ma^g^rgaI^m- 1 . 

N 1 1 



(63) 



Finally, substituting the data model presented in © we get 

cov(i) = 



= —Mr 1 1 A ff r 2 ASA H r 2 A + A ff r 2 s„A| 02 m _1 



N 



(64) 



It is interesting to note that for an array having G = I a 
diagonalization of A H A does not only ensure a homogeneous 
noise distribution over the map after the deconvolution oper- 
ation as demonstrated in Sec. IIII-BI but also diagonalizes the 
image covariance due to thermal noise, thus ensuring that the 
noise on the pixels is uncorrelated. The Gram matrix A H A 
describes the amount of linear independence (or orthogonality) 
of the direction of arrival vectors within the field of view 
of the array, which can be visualized as the array beam 
pattern. This observation therefore suggests that an array 
with a low side lobe pattern does not only provide good 
spatial separation between source signals, but also gives small 
covariance between image values after deconvolution. 
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D. Confusion noise 

The contributions to the effective noise from calibration 
errors and thermal noise scale inversely with the number 
of samples N, which is equal to the product of bandwidth 
and integration time. This implies that, theoretically, these 
sources of image noise can be reduced to arbitrarily low 
levels. In practice, the radio astronomical array will detect 
more sources with every reduction of the noise in the map. At 
some point, the number of detected sources becomes larger 
than the number of resolution elements in the image, which 
will turn the map into one blur of sources. The maximum 
density of discernable sources is the classical confusion limit 
and relates to the resolution of the image. 

In terms of self calibration, to have more detectable sources 
requires more source parameters to describe the source model. 
At some point, the self calibration problem becomes ill-posed. 
We will refer to this as the self calibration confusion limit. 
Although the exact limit depends on the minutiae of the array 
and source configuration, we can easily compute an upper limit 
on the tractable number of sources based on the argument that 
the number of unknowns should be smaller than the number 
of equations. The data model provides a relation between the 
parameters and the data. For a p element array, the covariance 
matrix contains p 2 independent real values, so the data model 
can be regarded as p 2 independent equations. Solving for 
the direction independent complex gains requires 2p — 1 real 
valued parameters, estimation of the receiver element noise 
powers requires another p parameters and the q sources are 
described by 3q — 1 parameters (the apparent source powers 
relative to the first source and two coordinates per source). 
The self calibration problem is therefore constrained by 

P 2 > 2p - 1 + p + 3q - 1 = 3p + 3q - 2 (65) 



implying that 



q< 



3p + 2 



(66) 



The spatial Nyquist sampling with the 8x8 URA allows 
an image grid of 15 x 15 = 225 image values. This resolution 
was confirmed by the condition number analysis presented 
in Fig. |3] However, the upper limit based on the analysis 
above for a 64 element array is 1302. The mismatch between 
this upper limit and the actual number of uniquely solvable 
image values can be attributed to the redundancy in the array. 
Due to this redundancy the cross-correlations of many antenna 
pairs provide the same spatial information instead of providing 
additional information on the spatial structure of the sky. In 
terms of the argument leading to Eq. ( l66l l. there is linear 
dependence between the equations and therefore the number 
of equations that can be used to solve parameters is reduced. 
The 5-armed array performs much better in this regard. E.g., 
for p = 40 the upper limit on the number of sources given by 
(l66l l is 494. Since V4~94 rj 22, we can thus form an image grid 
of 22 x 22 points, thus providing a resolution of Al ps 0.09. 
Figure [3] shows that the condition number for this array goes 
to numerical infinity at Al s» 0.08, showing that the 5-armed 
array approaches its theoretical self calibration confusion limit. 
This example illustrates that if processing power is cheap com- 
pared to antenna hardware, a non-redundant array should be 
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Fig. 4. (a) The logarithm (base 10) of the image covariance matrix due to 
calibration errors and (b) due to the measurement noise on the same color 
scale. The former is almost everywhere two orders of magnitude lower. 



preferred over a redundant array if the confusion limit should 
be reached without introducing an ill-posed deconvolution 
problem. 

In this section we addressed the classical confusion limit 
for pure imaging problems and the self calibration confusion 
limit in self calibration problems. This type of confusion is 
often called source confusion as opposed to side lobe confusion 
which refers to blurring of the image by side lobe leftovers 
introduced in the CLEAN process. In the analysis of this 
paper, side lobe confusion is part of the deconvolution problem 
and is thus intrinsically included in the analysis of calibration 
error and thermal noise propagation, and does not need to be 
addressed separately. Source confusion does require a separate 
treatment because it involves the source density distribution as 
function of source brightness. 

V. Implications 

A. Thermal noise vs. propagated calibration errors 

We compare the image covariance due to calibration errors 
to the image covariance due to the noise on the data in a 
simulation. The calibration parameters are calculated from a 
separate data set with the same data model and integration 
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o self calibration 

+ calibration observation 




parameter index 

Fig. 5. The CRB for the omni-directional complex gain amplitudes 
(parameters 1 through 64) and phases (parameters 65 through 127) for a 
separate calibration observation and the self calibration approach. 

time. We computed the CRB for the 8 x 8 URA using 
the relations presented in Section IIV-AI for the simultaneous 
solution of the omni-directional complex gains, G, and the 
system noise power, cr^, which was assumed to be the same 
for all array elements, assuming a short term integration over 
N = 16384 samples. This CRB was used in d47b to compute 
the image covariance matrix due to calibration errors. The 
magnitudes of the matrix entries are shown in Fig. H|a), m a 
log scale. 

The image covariance matrix due to the noise in the 
measurement was calculated using d64l and is shown in Fig. 
Sffe). This shows that the covariance of the image values due 
to the calibration errors is more concentrated at the source 
locations than the covariance due to the system noise, but is 
generally more than two orders of magnitude lower. These 
results indicate that the calibration errors only represent a 
minor contribution to the total effective noise, even when 
the calibration measurements are done on the same (short) 
time scales using sources of the same strength, i.e. when the 
calibration measurement is similar to the actual measurement. 

B. Calibration observations vs. self calibration 

In the previous section we discussed the situation in which 
the array is calibrated in a separate measurement. This scheme 
requires an extremely stable instrument. In most practical 
applications, the calibration is therefore done on the same data 
that is also used to provide the final image (self calibration). 
It is interesting to see how these scenarios compare. To this 
end, we used the relationships presented in Section IIV-AI to 
compute the CRB for simultaneous estimation of the omni- 
directional complex gains, the apparent source powers, the 
source locations and the system noise power for the 8x8 
URA. 

Figure [5] compares the CRBs for these two cases. The 
expected covariance of the gains and phases in the self 
calibration experiment is higher since more parameters have to 
be estimated simultaneously. The behavior of the CRB on the 



TABLE I 

Covariance of source power estimates. 



self calibration 



index 


2 




3 




4 
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0.266 x 1CT 4 


0.287 x 10" 


4 


0.022 x 10- 


-4 


3 


0.287 x 10~ 4 


1.237 x 10" 


4 


0.048 x 10- 


-4 


4 


0.022 x 10" 4 


0.048 x 10" 


4 


0.007 x 10" 


-4 


separate calibration 


index 


2 




3 




4 


2 


0.280 x 10~ 4 


0.072 x 10~ 


4 


0.006 x 10- 


-4 


3 


0.072 x 10~ 4 


0.772 x 10" 


4 


0.012 x 10- 


-4 


4 


0.006 x 10" 4 


0.012 x 10- 


4 


0.005 x 10" 


-4 



phases in the self calibrated observation (sloped upwards for 
increasing parameter index) can be explained by the interaction 
between the source parameters and the gain phases combined 
with the choice of the phase reference element in the corner 
of the array. 

Table |T] shows, for each of the two cases, the covariance 
matrices of the apparent source powers, i.e., the variance of 
the image values at the locations of the sources. The scaling 
factor ambiguity between G and S in the self calibration 
case is resolved by putting a\ = 1 to constrain the problem, 
and therefore only the covariance values of the other three 
sources is tabulated. For the case with a separate calibration 
observation the covariance matrix was extracted from the sum 
of the image covariances due to calibration errors and system 
noise. The results in the table indicate that the variance of the 
source power estimates in both cases are comparable, although 
the source power estimates are slightly better when gain 
calibration data is available from a separate measurement. The 
covariance values found for a separate calibration stage are 
much lower than the corresponding values for self calibration. 
This suggests that pure imaging is more capable of separating 
source signals from different directions than self calibrated 
imaging. 

VI. Conclusions 

In this paper we presented an analytic solution for snapshot 
imaging including deconvolution based on a data model (mea- 
surement equation) for the antenna signal covariance matrix 
or visibilities. The presented comprehensive framework is 
sufficiently flexible to enable extension of this analysis to 
synthesis observations, since the data model for a synthesis 
observation has the same form lfTT1 - lfT3l . This framework 
allowed us to make the first complete rigorous assessment of 
the effective noise floor, which is the combined effect of prop- 
agated calibration errors, thermal noise and source confusion, 
in the image in terms of the covariance of the image values. 
Our simulations for a 2D uniform rectangular array indicate 
that the effect of propagated calibration errors is strongly 
concentrated at the source locations but is considerably smaller 
than the thermal noise at other image points. The results also 
suggest that if the instrument is sufficiently stable, a separate 
calibration step is to be preferred over a self calibrated image 
since it allows better source separation in the imaging process. 
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The effects of deconvolution can be described by a deconvo- 
lution matrix that describes the amount of linear independence 
(orthogonality) of the spatial signature vectors weighted by the 
actual gains of the receiving elements. A diagonal deconvolu- 
tion matrix not only ensures the best possible spatial separation 
between the sources, but also ensures a homogeneous noise 
distribution over the map. This poses the question whether 
this matrix can be diagonalized by array design or by applying 
appropriate weights to the array elements. Since this factor is 
related to the array beam pattern, the latter is equivalent to 
finding weights that suppress the side lobe patterns at least in 
the direction of other sources, which suggest that techniques 
like Robust Capon Beamforming should provide the requested 
weighting ll25l . The condition number of the deconvolution 
matrix can be used to assess the quality of the solution to the 
deconvolution problem. 

Compared to a redundant array (ULA, URA), an array 
without redundant element spacings provides much better 
possibilities to approach the maximum number of solvable 
image points for a fixed number of antenna elements, thereby 
allowing the system to reach the theoretical self calibration 
confusion limit. 
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